Text File | 1989-09-20 | 4KB | 126 lines
Finding Duplicates
------------------
This StuffIt document should contain:
FileTree 4.97 - a shareware disk cataloging utility by Jody S. Kravitz
Plus the following 4 freeware utilities by Mark J. Smith:
FormatTree 0.1 - utility to reorganize FileTree output
SortTree 0.1 - utility to sort reformatted FileTree output
FindExact 0.1 - utility to find "exact" matches
FindOthers 0.1 - utility to find other "suspicious" matches
Here is a brief step-by-step guide to finding duplicate files using
the above utilities.
1. Launch FileTree
2. Using the File menu, create an output file
3. Using the Options menu, configure FileTree to report only
(a) Total File Size
(b) Full Path Names
4. Select a volume to catalog.
5. Launch FormatTree and reformat the FileTree output.
6. Launch SortTree (requires 1.3 MB) and sort the reformatted output.
7. Launch FindExact to search the sorted output for duplicates.
8. Launch FindOthers to search for additional duplicates.
Note: you can use another program or utility to sort the reformatted
output (especially if memory is limited), but you must first open the
reformatted file and remove the first 5 lines and last 3 lines of
text. SortTree does this for you automatically.
A few words about each of the freeware utilities:
FormatTree 0.1
--------------
This utility reformats output generated by the FileTree program.
The output must contain only 2 columns of information:
(1) the file size in the 1st column
(2) the full pathname in the 2nd column
FormatTree will split this information into 3 columns as follows:
(1) the filename in the 1st column
(2) the file size in the 2nd column
(3) the folder pathname in the 3rd column
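The reformatting step can be sketched in a few lines of Python. This is only an illustration, not FormatTree's actual code: it assumes each input line is a size followed by whitespace and a full pathname, and that folders are separated by colons as in classic Mac paths. The function name split_record and the sample path are made up for the example.

```python
def split_record(line, sep=":"):
    """Split '<size> <full pathname>' into (filename, size, folder path).

    Assumption: the size is a plain integer and the path uses `sep`
    between folders. The real FileTree output layout may differ.
    """
    size_text, full_path = line.strip().split(None, 1)
    folder, _, filename = full_path.rpartition(sep)
    return filename, int(size_text), folder


# Hypothetical sample line, not taken from real FileTree output.
print(split_record("4096 HD:Utilities:Find Duplicates:ReadMe"))
# prints ('ReadMe', 4096, 'HD:Utilities:Find Duplicates')
```

Putting the filename first is what lets a plain line-by-line sort bring identically named files together.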
SortTree 0.1
------------
This utility sorts the output generated by the FormatTree program.
SortTree ignores the first 5 lines and last 3 lines of the input file.
Otherwise, SortTree is a general purpose Quicksort program that can
be used to sort any text file containing fewer than 12,000 lines.
If you use another program to sort the output from FormatTree, you
need to remove the first 5 and last 3 lines manually before sorting.
SortTree requires 1.3 MB of RAM under both Finder and MultiFinder.
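If you sort with another tool, the trimming described above is easy to script. The sketch below is a stand-in, not SortTree itself: it drops the first 5 and last 3 lines and then uses Python's built-in sort rather than Quicksort. The function name trim_and_sort and the case-insensitive sort key are assumptions made for the example.

```python
def trim_and_sort(lines, header=5, footer=3):
    """Drop the header and footer lines, then sort what remains.

    Uses Python's built-in sort, not Quicksort. Sorting ignores case
    (an assumption) so that names differing only in case end up
    next to each other.
    """
    body = lines[header:len(lines) - footer]
    return sorted(body, key=str.lower)


# Hypothetical input: 5 header lines, 3 records, 3 footer lines.
sample = ["h1", "h2", "h3", "h4", "h5", "beta", "Gamma", "alpha", "f1", "f2", "f3"]
print(trim_and_sort(sample))
```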
FindExact 0.1
-------------
This utility searches for exact matches between pairs of adjacent
filenames. For this reason, input into this program must first be
sorted into alphabetical order.
FindExact is unique in that it:
(1) is case insensitive
(2) strips leading, trailing and embedded spaces
(3) strips underscore characters
(4) strips filename extensions
FindExact will treat "My File.pit", "my_file.sit" and "MyFile.01" as
exact duplicates.
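The four rules above amount to normalizing each filename before comparing it with its neighbour. Here is a minimal sketch of that idea in Python. It is not FindExact's source: the names normalize and find_exact are invented here, and it assumes that only the text after the last period counts as the extension.

```python
def normalize(filename):
    """Apply the four rules: drop the extension, spaces and underscores,
    and ignore case. Assumption: the extension is whatever follows the
    last period in the name."""
    stem = filename.rsplit(".", 1)[0]
    return stem.replace(" ", "").replace("_", "").lower()


def find_exact(sorted_names):
    """Return adjacent pairs whose normalized names are identical."""
    return [(a, b) for a, b in zip(sorted_names, sorted_names[1:])
            if normalize(a) == normalize(b)]


names = ["My File.pit", "my_file.sit", "MyFile.01", "Other.sit"]
print([normalize(n) for n in names])
# prints ['myfile', 'myfile', 'myfile', 'other']
print(find_exact(names))
```

Because only adjacent entries are compared, the list has to be sorted first, as noted above.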
FindOther 0.1
-------------
This utility searches for high probability matches between pairs of
adjacent file names. For this reason, input into this program must
first be sorted into alphabetical order.
FindOther finds and discards matches detected by FindExact. It then
searches for file names which have 75% or more characters in common.
FindOther can find duplicates like:
'Animation Stack' and 'AnimationStak.sit'
'DeskPict.sit', 'DeskPict1.0' and 'DeskPict_1.1.sit'
'GateKeeper111.sit' and 'Gate_Keeper_1.1.sit'
Note: FindOther will report many more non-duplicates than duplicates.
However, it reduces the search space (for you, the user) to more
manageable proportions by reporting only suspect cases (those with a
high probability of being duplicates). Its utility lies in its
ability to identify cases like those illustrated above.
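This ReadMe does not say exactly how "75% or more characters in common" is measured, so the following Python sketch is one possible reading and not the utility's actual method. It assumes names are first normalized as in FindExact, then counts the characters the two names share (ignoring order) and divides by the length of the longer name. The names similarity and is_suspect are invented for the example.

```python
from collections import Counter


def normalize(filename):
    """Drop the last extension, spaces and underscores; ignore case."""
    stem = filename.rsplit(".", 1)[0]
    return stem.replace(" ", "").replace("_", "").lower()


def similarity(a, b):
    """Fraction of characters shared by two normalized names.

    Assumption: shared characters are counted without regard to order
    and divided by the length of the longer name.
    """
    x, y = normalize(a), normalize(b)
    if not x or not y:
        return 0.0
    shared = sum((Counter(x) & Counter(y)).values())
    return shared / max(len(x), len(y))


def is_suspect(a, b, threshold=0.75):
    """True for close matches that are not already exact matches."""
    return normalize(a) != normalize(b) and similarity(a, b) >= threshold


print(is_suspect("Animation Stack", "AnimationStak.sit"))
print(is_suspect("GateKeeper111.sit", "Gate_Keeper_1.1.sit"))
print(is_suspect("My File.pit", "my_file.sit"))  # exact match, so discarded
```

A measure this loose flags many unrelated names that happen to share letters, which is consistent with the note above about non-duplicates.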
For further information or source code, please contact Mark J. Smith
at one of the following locations:
GEnie: MJMS
BIX: MJMS
MAC-LINK BBS: 514-935-4257 (sysop)
DMI Systems
1028 Greene Ave.
Montreal, QC H3Z 1Z7
CANADA
End of ReadMe.